Current approaches for fixing systematic problems in NLP models (e.g. regex patches, finetuning on more data) are either brittle, or labor-intensive and liable to shortcuts. In contrast, humans often provide corrections to each other through natural language. Taking inspiration from this, we explore natural language patches -- declarative statements that allow developers to provide corrective feedback at the right level of abstraction, either overriding the model (``if a review gives 2 stars, the sentiment is negative'') or providing additional information the model may lack (``if something is described as the bomb, then it is good''). We model the task of determining if a patch applies separately from the task of integrating patch information, and show that with a small amount of synthetic data, we can teach models to effectively use real patches on real data -- 1 to 7 patches improve accuracy by ~1-4 accuracy points on different slices of a sentiment analysis dataset, and F1 by 7 points on a relation extraction dataset. Finally, we show that finetuning on as many as 100 labeled examples may be needed to match the performance of a small set of language patches.
translated by 谷歌翻译
When trained on language data, do transformers learn some arbitrary computation that utilizes the full capacity of the architecture or do they learn a simpler, tree-like computation, hypothesized to underlie compositional meaning systems like human languages? There is an apparent tension between compositional accounts of human language understanding, which are based on a restricted bottom-up computational process, and the enormous success of neural models like transformers, which can route information arbitrarily between different parts of their input. One possibility is that these models, while extremely flexible in principle, in practice learn to interpret language hierarchically, ultimately building sentence representations close to those predictable by a bottom-up, tree-structured model. To evaluate this possibility, we describe an unsupervised and parameter-free method to \emph{functionally project} the behavior of any transformer into the space of tree-structured networks. Given an input sentence, we produce a binary tree that approximates the transformer's representation-building process and a score that captures how "tree-like" the transformer's behavior is on the input. While calculation of this score does not require training any additional models, it provably upper-bounds the fit between a transformer and any tree-structured approximation. Using this method, we show that transformers for three different tasks become more tree-like over the course of training, in some cases unsupervisedly recovering the same trees as supervised parsers. These trees, in turn, are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
translated by 谷歌翻译
To build general robotic agents that can operate in many environments, it is often imperative for the robot to collect experience in the real world. However, this is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing as real-world experience: internet videos of humans using their hands. Visual priors, such as visual features, are often learned from videos, but we believe that more information from videos can be utilized as a stronger prior. We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior. These actions and physical priors in the neural network dictate the typical human behavior for a particular robot task. We test our approach on a robot arm and dexterous hand-based system and show strong results on various manipulation tasks, outperforming various state-of-the-art methods. Videos at https://video-dex.github.io
translated by 谷歌翻译
Recently, automated co-design of machine learning (ML) models and accelerator architectures has attracted significant attention from both the industry and academia. However, most co-design frameworks either explore a limited search space or employ suboptimal exploration techniques for simultaneous design decision investigations of the ML model and the accelerator. Furthermore, training the ML model and simulating the accelerator performance is computationally expensive. To address these limitations, this work proposes a novel neural architecture and hardware accelerator co-design framework, called CODEBench. It is composed of two new benchmarking sub-frameworks, CNNBench and AccelBench, which explore expanded design spaces of convolutional neural networks (CNNs) and CNN accelerators. CNNBench leverages an advanced search technique, BOSHNAS, to efficiently train a neural heteroscedastic surrogate model to converge to an optimal CNN architecture by employing second-order gradients. AccelBench performs cycle-accurate simulations for a diverse set of accelerator architectures in a vast design space. With the proposed co-design method, called BOSHCODE, our best CNN-accelerator pair achieves 1.4% higher accuracy on the CIFAR-10 dataset compared to the state-of-the-art pair, while enabling 59.1% lower latency and 60.8% lower energy consumption. On the ImageNet dataset, it achieves 3.7% higher Top1 accuracy at 43.8% lower latency and 11.2% lower energy consumption. CODEBench outperforms the state-of-the-art framework, i.e., Auto-NBA, by achieving 1.5% higher accuracy and 34.7x higher throughput, while enabling 11.0x lower energy-delay product (EDP) and 4.0x lower chip area on CIFAR-10.
translated by 谷歌翻译
Running machine learning inference on tiny devices, known as TinyML, is an emerging research area. This task requires generating inference code that uses memory frugally, a task that standard ML frameworks are ill-suited for. A deployment framework for TinyML must be a) parametric in the number representation to take advantage of the emerging representations like posits, b) carefully assign high-precision to a few tensors so that most tensors can be kept in low-precision while still maintaining model accuracy, and c) avoid memory fragmentation. We describe MinUn, the first TinyML framework that holistically addresses these issues to generate efficient code for ARM microcontrollers (e.g., Arduino Uno, Due and STM32H747) that outperforms the prior TinyML frameworks.
translated by 谷歌翻译
State-of-the-art algorithms for Approximate Nearest Neighbor Search (ANNS) such as DiskANN, FAISS-IVF, and HNSW build data dependent indices that offer substantially better accuracy and search efficiency over data-agnostic indices by overfitting to the index data distribution. When the query data is drawn from a different distribution - e.g., when index represents image embeddings and query represents textual embeddings - such algorithms lose much of this performance advantage. On a variety of datasets, for a fixed recall target, latency is worse by an order of magnitude or more for Out-Of-Distribution (OOD) queries as compared to In-Distribution (ID) queries. The question we address in this work is whether ANNS algorithms can be made efficient for OOD queries if the index construction is given access to a small sample set of these queries. We answer positively by presenting OOD-DiskANN, which uses a sparing sample (1% of index set size) of OOD queries, and provides up to 40% improvement in mean query latency over SoTA algorithms of a similar memory footprint. OOD-DiskANN is scalable and has the efficiency of graph-based ANNS indices. Some of our contributions can improve query efficiency for ID queries as well.
translated by 谷歌翻译
在边缘计算中,必须根据用户移动性迁移用户的服务配置文件。已经提出了强化学习(RL)框架。然而,这些框架并不考虑偶尔的服务器故障,尽管很少会阻止Edge Computing用户的延迟敏感应用程序(例如自动驾驶和实时障碍物检测)的平稳和安全功能,因为用户的计算作业不再是完全的。由于这些故障的发生率很低,因此,RL算法本质上很难为数据驱动的算法学习针对典型事件和罕见事件方案的最佳服务迁移解决方案。因此,我们引入了罕见的事件自适应弹性框架火,该框架将重要性采样集成到加强学习中以放置备份服务。我们以与其对价值函数的贡献成正比的稀有事件进行采样,以学习最佳政策。我们的框架平衡了服务迁移和迁移成本之间的迁移权衡,与失败的成本以及备份放置和移民的成本。我们提出了一种基于重要性抽样的Q-学习算法,并证明其界限和收敛到最佳性。随后,我们提出了新的资格轨迹,我们的算法的线性函数近似和深Q学习版本,以确保其扩展到现实世界情景。我们扩展框架,以适应具有不同风险承受失败的用户。最后,我们使用痕量驱动的实验表明我们的算法在发生故障时会降低成本。
translated by 谷歌翻译
最后,这项工作将包括对解释的上下文形式的调查。在这项研究中,我们将包括一个时间障碍的方案,其中将测试不同水平的理解水平,以使我们能够评估合适且可理解的解释。为此,我们提出了不同的理解水平(lou)。用户研究将旨在比较不同的LOU在不同的互动环境中。将研究同时医院环境的用户研究。
translated by 谷歌翻译
在软件开发过程中,开发人员需要回答有关代码语义方面的查询。即使已经用神经方法进行了广泛的自然语言研究,但尚未探索使用神经网络对代码回答语义查询的问题。这主要是因为没有现有的数据集,具有提取性问答和答案对,涉及复杂概念和较长推理的代码。我们通过构建一个名为Codequeries的新的,策划的数据集并提出了一种关于代码的神经问题方法来弥合这一差距。我们基于最先进的预训练的代码模型,以预测答案和支持事实跨度。给定查询和代码,只有一些代码可能与回答查询有关。我们首先在理想的环境下进行实验,其中仅给出了模型的相关代码,并表明我们的模型做得很好。然后,我们在三个务实的考虑因素下进行实验:(1)扩展到大尺寸的代码,(2)从有限数量的示例中学习,(3)代码中对次要语法错误的鲁棒性。我们的结果表明,虽然神经模型可以抵御代码中的次要语法错误,代码的大小增加,与查询无关的代码的存在以及减少的培训示例数量限制了模型性能。我们正在释放数据和模型,以促进未来关于回答代码语义查询的问题的工作。
translated by 谷歌翻译
近年来,视觉模型(VLMS)在视觉推理任务(例如属性,位置)上表现出色。尽管这些任务衡量了必要的知识以在给定的视觉实例上进行理性和理性,但是它们却不能衡量VLMS保留和概括此类知识的能力。在这项工作中,我们评估了他们获得“可见”物理知识的能力 - 从静态场景的图像,尤其是在对象颜色,大小和空间的尺寸的范围内,可以轻松访问的信息。我们建立了自动管道,以获取综合知识资源,用于校准和探索这些模型。我们的结果表明,在所有三个任务中,模型和人类绩效之间存在严重的差距。此外,我们的字幕预测的基线(Capbert)在大小和空间任务上都大大优于VLM,这强调了,尽管有足够的视觉方式访问了地面语言,但它们仍在努力保留此类知识。数据集和代码可在https://github.com/axe--/viphy上找到。
translated by 谷歌翻译